Domain term relevance through tf-dcf
Abstract
This paper proposes a new index for the relevance of terms extracted from domain corpora. We call it term frequency, disjoint corpora frequency (tf-dcf); it is based on the absolute frequency of each term, tempered by the term's frequency in other (contrasting) corpora. The conceptual differences and the mathematical computation of the proposed index are discussed with respect to similar approaches that also take the frequency in contrasting corpora into account. To illustrate the efficiency of the tf-dcf index, the paper evaluates the application of this index alongside those similar approaches.
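The abstract does not state the formula, so the following is only a minimal sketch of one formulation consistent with the description above: the absolute frequency of a term in the domain corpus is divided by a penalty that grows with the term's frequency in each contrasting corpus. The function name `tf_dcf`, the product of `1 + log(1 + tf)` factors, the natural logarithm, and the toy counts are all assumptions made for illustration, not details confirmed by this page.

```python
import math
from collections import Counter


def tf_dcf(term, domain_tf, contrast_tfs):
    """Score `term` for the domain corpus (assumed formulation).

    domain_tf:    Counter of absolute term frequencies in the domain corpus.
    contrast_tfs: list of Counters, one per contrasting corpus.
    """
    penalty = 1.0
    for tf in contrast_tfs:
        # A term absent from a contrasting corpus contributes a factor of 1,
        # so it is not penalised at all (Counter returns 0 for missing keys).
        penalty *= 1.0 + math.log(1.0 + tf[term])
    return domain_tf[term] / penalty


# Hypothetical toy counts: "neutron" occurs only in the domain corpus,
# while "data" is also frequent in both contrasting corpora.
domain = Counter({"neutron": 40, "data": 40})
contrast = [Counter({"data": 120}), Counter({"data": 15, "model": 30})]

for term in ("neutron", "data"):
    print(term, round(tf_dcf(term, domain, contrast), 3))
```

Under these assumptions, a term that never appears in the contrasting corpora keeps its absolute frequency as its score, while a term that is equally frequent in the domain corpus but common elsewhere is pushed down sharply.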
Related resources
Abordagens para Estimar Relevância de Relações Não-Taxonômicas Extraídas de Corpus de Domínio
This paper compares two approaches to weighting the relevance of extracted non-taxonomic relations found in domain corpora. The first approach computes the relevance according to the absolute frequency of the verb. The second approach computes the relevance according to the verb's frequency and uniqueness in each corpus using the tf-dcf relevance index, an index that takes into account the...
Using TF-IDF to Determine Word Relevance in Document Queries
In this paper, we examine the results of applying Term Frequency Inverse Document Frequency (TF-IDF) to determine what words in a corpus of documents might be more favorable to use in a query. As the term implies, TF-IDF calculates values for each word in a document through an inverse proportion of the frequency of the word in a particular document to the percentage of documents the word appear...
A Novel Term_Class Relevance Measure for Text Categorization
In this paper, we introduce a new measure called Term_Class relevance to compute the relevancy of a term in classifying a document into a particular class. The proposed measure estimates the degree of relevance of a given term, in placing an unlabeled document to be a member of a known class, as a product of Class_Term weight and Class_Term density; where the Class_Term weight is the ratio of t...
Comparative Analysis of IDF Methods to Determine Word Relevance in Web Document
Inverse document frequency (IDF) is one of the most useful and widely used concepts in information retrieval. When it is used in combination with the term frequency (TF), the result is a very effective term weighting scheme (TF-IDF) that has been applied in information retrieval to determine the weight of the terms. Terms with high TF-IDF values imply a strong relationship with the document the...
Learning Global Term Weights for Content-based Recommender Systems
Recommender systems typically leverage two types of signals to effectively recommend items to users: user activities and content matching between user and item profiles, and recommendation models in literature are usually categorized into collaborative filtering models, content-based models and hybrid models. In practice, when rich profiles about users and items are available, and user activiti...